On the Relevance of Syntactic and Discourse Features for Author Profiling and Identification
Abstract
Most approaches to author profiling and author identification focus mainly on lexical features, i.e., on the content of a text. We argue that syntactic dependency and discourse features play a significantly more prominent role than they have been given in the past. We show that they achieve state-of-the-art performance in author and gender identification on a literary corpus while keeping the feature set small: it comprises only 188 features and still outperforms the winner of the PAN 2014 shared task on author verification in the literary genre.
Similar resources
Author gender identification from text using Bayesian Random Forest
The heavy use of virtual environments, and the connections users form through social networks such as Facebook, Instagram, and Twitter, make it more necessary than ever to identify shared subjects in these environments. Several applications benefit from reliable methods for inferring the age and gender of social media users. Such applications exist across a wide area of fields,...
A Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique used to predict the profile of the author of an unknown text by analyzing writing style. Author profiles are characteristics of the authors such as gender, age, native language, country, and educational background. The existing approaches to author profiling suffer from problems like high dimensionality of features and fail to capture th...
The Impact of Recasts on the Syntactic Accuracy of Iranian EFL University Students’ Oral Discourse
Among the major issues raised by classroom SLA researchers is the debate on the degree to which the teacher’s or learner’s attention should be directed to linguistic features. However, one relevant variable in corrective feedback studies that seems to be less operationalized is the differential impact of different types of feedback on the accuracy of the oral performance of the participants...
STANCE AND ENGAGEMENT DISCOURSE MARKERS IN JOURNAL’S “AUTHOR GUIDELINES”
Over the past decade, there has been increasing interest in the study of interactional metadiscourse markers in different contexts. However, little research has been conducted on the discourse of journal author guidelines, especially the use of metadiscourse markers in this genre. Therefore, this corpus-based study had three main aims: 1) to delve deep into the types, frequencies and f...
Iranian Advanced EFL Learners’ Awareness and the Use of Marked Word Order: Discourse-pragmatically Motivated Variations
The present investigation was designed to study the production and comprehension of specific means of information highlighting by advanced Iranian learners of English as a Foreign Language. The study focused on discourse-pragmatically motivated variations of the basic word order, such as inversion, pre-posing, and it- and Wh-clefts. After the Nelson test was administered, a homogeneous group was formed. ...